.. _`One-Hot Encoder`:

.. _`org.sysess.sympathy.machinelearning.one_hot_encoder`:

One-Hot Encoder
```````````````

.. image:: label_binarizer.svg
   :width: 48


Encode categorical integer features using a one-hot (also known as one-of-K) scheme.

For each categorical input feature, the encoder produces one output
feature per category; exactly one of them is marked as true and the
rest as false. This encoding is needed for feeding categorical data to
many scikit-learn estimators, notably linear models and SVMs with the
standard kernels.

Note: to one-hot encode y labels, use a LabelBinarizer instead. Also
note: the categories for the input data are determined automatically
(as with the ``categories='auto'`` keyword in scikit-learn).
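
The scheme described above can be sketched in plain Python. This is only an illustration of the encoding itself, not the node's or scikit-learn's implementation; the helper names ``fit_categories`` and ``one_hot`` are hypothetical. Categories are taken to be the sorted unique values of each column, which is an assumption about what automatic category detection amounts to.

```python
def fit_categories(rows):
    """Collect the sorted unique values of each column (feature)."""
    n_features = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_features)]


def one_hot(rows, categories):
    """Expand each feature into one 0/1 column per category."""
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            # Exactly one entry per feature is 1, the rest are 0.
            out.extend(1 if value == c else 0 for c in cats)
        encoded.append(out)
    return encoded


# Two integer-coded categorical features, three samples.
rows = [[0, 2], [1, 0], [0, 1]]
categories = fit_categories(rows)    # [[0, 1], [0, 1, 2]]
encoded = one_hot(rows, categories)  # first row becomes [1, 0, 0, 0, 1]
```

The first feature has two categories and the second has three, so each sample is expanded from two input features into five output features.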


Documentation
:::::::::::::

Attributes
==========

    **active_features_**


    **categories_**
        The categories of each feature determined during fitting
        (in order of the features in X and corresponding with the output
        of ``transform``). This includes the category specified in ``drop``
        (if any).


    **feature_indices_**


    **n_values_**


Definition
::::::::::


Output ports
============

    **model**  model
        Model


Configuration
=============

    **Handle unknown** (handle_unknown)
        How to handle categories that were not seen during fitting when
        transforming new data.
    **Transformed array in sparse format** (sparse)
        Generates a sparse matrix if true.
        Warning: sparse matrices are not handled by all Sympathy nodes and may be
        silently converted to non-sparse arrays.
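
As a rough illustration of the **Handle unknown** setting, the sketch below mimics the two behaviours that scikit-learn documents for ``handle_unknown``: ``'error'`` raises an exception, while ``'ignore'`` encodes an unseen category as all zeros. The function ``encode_feature`` is hypothetical, and it is an assumption that the node exposes exactly these two choices.

```python
def encode_feature(value, categories, handle_unknown="error"):
    """One-hot encode a single value against the fitted categories."""
    if value in categories:
        return [1 if value == c else 0 for c in categories]
    if handle_unknown == "ignore":
        # Unseen category: every output column for this feature is 0.
        return [0] * len(categories)
    raise ValueError(f"unknown category {value!r}")


known = ["blue", "green", "red"]  # categories found during fitting

seen = encode_feature("green", known)
ignored = encode_feature("purple", known, handle_unknown="ignore")

try:
    encode_feature("purple", known)  # default mode rejects unseen values
    raised = False
except ValueError:
    raised = True
```

With ``'ignore'``, an all-zero row cannot be told apart from missing data downstream, so ``'error'`` is the safer choice when unseen categories indicate a data problem.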


Implementation
==============

.. automodule:: node_preprocessing
    :noindex:

.. class:: OneHotEncoder
    :noindex: